329 research outputs found

    A performance comparison of feature extraction methods for sentiment analysis

    Sentiment analysis is the task of classifying documents according to their sentiment polarity. Before sentiment documents can be classified, plain-text documents need to be transformed into data the system can work with. This step is known as feature extraction. Feature extraction produces text representations enriched with information that improves classification results. The experiment in this work investigates the effects of applying different sets of extracted features and discusses the behavior of those features in sentiment analysis. The feature extraction methods include unigrams, bigrams, trigrams, Part-Of-Speech (POS) and SentiWordNet methods. The unigram, part-of-speech and SentiWordNet features are word-based features, whereas bigrams and trigrams are phrase-based features. The experimental results show that phrase-based features are more effective for sentiment analysis, as the accuracies they produce are much higher than those of word-based features. This might be because word-based features disregard the sentence structure and word sequence of the original text, thus distorting its original meaning. Bigram and trigram features retain some of the word sequence of the sentences, thus contributing to better representations of the text.
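The contrast between word-based and phrase-based features can be illustrated with a minimal sketch. It assumes lowercased whitespace tokenization and plain n-gram counts; the helper names `ngrams` and `extract_features` and the sample review are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, n):
    """Count the n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


review = "the plot is not good"           # hypothetical example document
unigrams = extract_features(review, 1)    # word-based features
bigrams = extract_features(review, 2)     # phrase-based features
```

In this toy review the unigram features contain `("good",)` with no trace of the negation, while the bigram features contain `("not", "good")`, which is the kind of retained word sequence the abstract refers to.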

    An optimized multi-layer ensemble framework for sentiment analysis

    Public opinion plays an important role in decision-making tasks in various fields. Sentiment analysis is a key task in summarizing opinions, as it classifies opinion documents into positive and negative sentiment groups. Machine-learning-based classification is efficient and versatile. The ensemble concept is used to improve classification accuracy by combining the decisions of multiple classifiers. In this work, a framework for sentiment analysis is designed to extend the ensemble concept to all subtasks of machine learning classification in order to achieve better analysis. There are 3 subtasks in machine-learning-based sentiment analysis: feature extraction, feature selection and classification. The ensemble concept is applied to all 3 subtasks by using different methods to perform each one and combining their results. Optimization is performed using a Genetic Algorithm to find the combination of methods that performs better. The proposed framework is tested on datasets from 4 different domains, and the sentiment analysis accuracy is shown to be very high. Future work includes testing the framework on different classification domains and with different optimization algorithms.
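As a rough illustration of combining the decisions of multiple classifiers, the sketch below uses simple majority voting. The abstract does not state which combination rule the framework uses, so majority voting is an assumption here, and the three keyword classifiers are hypothetical stand-ins for trained models. The Genetic Algorithm step is not sketched.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, document):
    """Ask every classifier for a label and combine them by majority vote."""
    return majority_vote([clf(document) for clf in classifiers])


# Hypothetical toy classifiers; a real system would use trained models.
def has_good(doc):
    return "positive" if "good" in doc else "negative"


def no_bad(doc):
    return "negative" if "bad" in doc else "positive"


def has_great(doc):
    return "positive" if "great" in doc else "negative"


label = ensemble_predict([has_good, no_bad, has_great], "a good film")
```

With an odd number of classifiers and two labels, a tie cannot occur, which keeps the voting rule unambiguous in this sketch.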

    Beyond Sentiment Analysis: A Review of Recent Trends in Text Based Sentiment Analysis and Emotion Detection

    Sentiment analysis is probably one of the best-known areas in text mining. However, in recent years, as big data rose in popularity, more areas of text classification have been explored. Perhaps the next task to catch on is emotion detection, the task of identifying emotions. This is because emotions are the finer-grained information that can be extracted from opinions. So besides writer sentiment, writer emotion is also valuable data. Emotion detection can be done using text, facial expressions, verbal communication and brain waves; however, the focus of this review is on text-based sentiment analysis and emotion detection. The internet has provided an avenue for the public to express their opinions easily. These expressions contain not only positive or negative sentiments but emotions as well. These emotions can help in social behaviour analysis and in decision and policy making for companies and the country. Emotion detection can further support other tasks such as opinion mining and early depression detection. This review provides a comprehensive analysis of the shift in recent trends from text sentiment analysis to emotion detection and the challenges in these tasks. We summarize some of the works from the last five years and look at the methods they used. We also look at the models of emotion classes that are generally referenced. The trend in text-based emotion detection has shifted from early keyword-based comparisons to machine learning and deep learning algorithms that provide more flexibility and better performance.

    Sentiment analysis based on probabilistic classifier techniques in various Indonesian review data

    Sentiment analysis is a field of data science used to obtain a broader, holistic view of users’ needs and expectations. Indonesian user opinions have the potential to become valuable information through sentiment-analysis tasks. One of the most widely used supervised-learning techniques in Indonesian sentiment analysis is the Naïve Bayes classifier. The classifier can be optimized and tuned in various models to increase sentiment-analysis performance. This research examines the performance of various Naïve Bayes models in sentiment analysis, especially when applied to small datasets, where overfitting is a problem. The four Naïve Bayes models used are Gaussian, Multinomial, Complement and Bernoulli. We also analyze the effect of various pre-processing techniques on the models’ performance. Moreover, we build the first fashion dataset from the Indonesian marketplace, which has unique characteristics compared to datasets from other domains. Finally, we also use various datasets in the experiment to test the Naïve Bayes models’ performance. The experimental results show that Complement Naïve Bayes is superior to the other models, especially in handling overfitting, with an F1-score of approximately 0.82.

    SARS-CoV-2 Transmission in Alberta, British Columbia, and Ontario, Canada, December 25, 2019, to December 1, 2020

    Objective: This study aimed to investigate coronavirus disease (COVID-19) epidemiology in Alberta, British Columbia, and Ontario, Canada. Methods: Using data through December 1, 2020, we estimated the time-varying reproduction number, R_t, using the EpiEstim package in R, and calculated incidence rate ratios (IRR) across the 3 provinces. Results: In Ontario, 76% (92 745/121 745) of cases were in Toronto, Peel, York, Ottawa, and Durham; in Alberta, 82% (49 878/61 169) in Calgary and Edmonton; in British Columbia, 90% (31 142/34 699) in Fraser and Vancouver Coastal. Across the 3 provinces, R_t dropped to ≤ 1 after April. In Ontario, R_t would remain < 1 in April if congregate-setting-associated cases were excluded. Over summer, R_t remained < 1 in Ontario, ~1 in British Columbia, and ~1 in Alberta, except early July when R_t was > 1. In all 3 provinces, R_t was > 1, reflecting surges in case count from September through November. Compared with British Columbia (684.2 cases per 100 000), Alberta (IRR = 2.0; 1399.3 cases per 100 000) and Ontario (IRR = 1.2; 835.8 cases per 100 000) had a higher cumulative case count per 100 000 population. Conclusions: Alberta and Ontario had a higher incidence rate than British Columbia, but R_t trajectories were similar across all 3 provinces.
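The reported ratios and percentages can be checked with a few lines of arithmetic. This sketch assumes the IRR is the simple ratio of each province's cumulative cases per 100 000 to the British Columbia value; the abstract does not describe the exact computation, and the variable names are illustrative only. It does not attempt the R_t estimation.

```python
# Cumulative cases per 100 000 population, as reported in the abstract.
rates = {"British Columbia": 684.2, "Alberta": 1399.3, "Ontario": 835.8}

# Assumption: IRR = province rate / reference rate, with British Columbia
# as the reference province.
reference = rates["British Columbia"]
irr = {province: round(rate / reference, 1) for province, rate in rates.items()}

# (cases in the named regions, total provincial cases), as reported.
regional_cases = {
    "Ontario": (92745, 121745),
    "Alberta": (49878, 61169),
    "British Columbia": (31142, 34699),
}
share_pct = {
    province: round(100 * part / total)
    for province, (part, total) in regional_cases.items()
}
```

Under that assumption the computed values agree with the abstract: ratios of 2.0 for Alberta and 1.2 for Ontario, and regional shares of 76%, 82%, and 90%.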

    Predicting network traffic anomalies in Denial-of-service attacks – a nonlinear approach

    The amount of data moving across the network at any given time is referred to as network traffic. It consists of data units that are encapsulated in packets and sent over a network. Distributed Denial-of-Service (DDoS) attacks are attempts to disrupt normal network, service, or server traffic. DDoS attacks attempt to disrupt legitimate users' work and data transfers by sending large packets or large volumes of traffic. Various network traffic prediction techniques are investigated in this study, and a nonlinear time series method, the Multilayer Perceptron Neural Network (MLPNN), has been chosen to evaluate network traffic prediction. The results on the NSL-KDD dataset show that the approach can reach a prediction accuracy of up to 98.87%, outperforming other models such as Sequential Minimal Optimization (SMO) by 2.26%.